Training and Evaluation of a Spoken Language Understanding System
Abstract
Introduction. This paper describes our results on a spoken language application for finding directions. The spoken language system consists of the MIT SUMMIT speech recognition system [20] loosely coupled to the UNISYS PUNDIT language understanding system [9], with SUMMIT providing the top N candidates (ranked by acoustic score) to the PUNDIT system. The direction-finding capability is provided by an expert system which is also part of the MIT VOYAGER system [18]. One major goal in this research has been to understand issues of training vs. coverage in porting a language understanding system to a new domain. Specifically, we wished to determine how much data it takes to train a spoken language system to a given level of performance for a new domain. The answer to this question can guide the design of data collection tasks by indicating how much data to collect. We address a related question, namely how to quantify the growth of a system as a function of training, in [12]. To explore the relationship of training to coverage, we have developed a methodology to measure coverage of unseen material as a function of training material. Using successive batches of new material, we assessed coverage on a batch of unseen material, then trained on this material until we reached a certain level of coverage, then repeated the experiment on a new batch of material. System coverage seemed to level off at about 70% of unseen data after 1000 sentences of training data. A second goal was to develop a methodology for automatically tuning a broad-coverage grammar to a new application domain. This approach avoids repeating domain-independent grammar development work for each new domain. To do this we developed a method for deriving a minimal grammar and a minimal...
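The batch-wise coverage measurement described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ToySystem`, `coverage`, and `coverage_curve` are hypothetical names, and the toy system treats a sentence as covered when all of its words have been seen in training, which is not how PUNDIT decides coverage. The abstract also says training on a batch continued until a certain coverage level was reached; the sketch simply trains once on each batch.

```python
from typing import List, Tuple


class ToySystem:
    """Hypothetical stand-in for a language understanding system.

    A sentence counts as covered when every word is in the known
    vocabulary. This only makes the evaluation loop runnable; it is
    not a model of PUNDIT.
    """

    def __init__(self) -> None:
        self.vocab = set()

    def covers(self, sentence: str) -> bool:
        return all(word in self.vocab for word in sentence.lower().split())

    def train(self, sentences: List[str]) -> None:
        for sentence in sentences:
            self.vocab.update(sentence.lower().split())


def coverage(system: ToySystem, sentences: List[str]) -> float:
    """Fraction of sentences the system covers (0.0 for an empty batch)."""
    if not sentences:
        return 0.0
    return sum(system.covers(s) for s in sentences) / len(sentences)


def coverage_curve(system: ToySystem,
                   batches: List[List[str]]) -> List[Tuple[int, float]]:
    """Pair the number of training sentences seen so far with the
    coverage measured on the next, still unseen, batch."""
    curve = []
    seen = 0
    for batch in batches:
        # Measure on the batch before the system has trained on it.
        curve.append((seen, coverage(system, batch)))
        system.train(batch)
        seen += len(batch)
    return curve


# Invented example sentences, not data from the paper.
batches = [
    ["where is the library", "how far is the bank"],
    ["where is the bank", "how do i get to the museum"],
    ["how far is the museum", "where is the nearest cafe"],
]
curve = coverage_curve(ToySystem(), batches)
print(curve)
```

Plotting the second element of each pair against the first gives a training-versus-coverage curve of the kind the abstract describes. The key design point is that each batch is scored before it is used for training, so every measurement is on unseen material.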
Similar resources
On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...
The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud
N. Bagheri, M.A.; E. Abbasi, Ph.D.; M. GeramiPour, Ph.D. The present study was conducted to investigate the impact of language learning activities on development of spoken language in 5-6-year-old children at private preschool center...
Core Units of Spoken Grammar in Global ELT Textbooks
Materials evaluation studies have constantly demonstrated that there is no one fixed procedure for conducting textbook evaluation studies. Instead, the criteria must be selected according to the needs and objectives of the context in which evaluation takes place. The speaking skill as part of the communicative competence has been emphasized as an important objective in language teaching. The pr...
Stochastic language models for speech recognition and understanding
Stochastic language models for speech recognition have traditionally been designed and evaluated in order to optimize word accuracy. In this work, we present a novel framework for training stochastic language models by optimizing two different criteria appropriate for speech recognition and language understanding. First, the language entropy and salience measure are used for learning the releva...
Multi-Site Data Collection and Evaluation in Spoken Language
The Air Travel Information System (ATIS) domain serves as the common task for DARPA spoken language system research and development. The approaches and results possible in this rapidly growing area are structured by available corpora, annotations of that data, and evaluation methods. Coordination of this crucial infrastructure is the charter of the Multi-Site ATIS Data COllection Working group ...
Publication date: 1990